Smart Paper Notes

Minimalist Data Wrangling with Python

Marek Gagolewski

Category: Machine Learning

2022-11-09

Minimalist Data Wrangling with Python is envisaged as a student's first introduction to data science, providing a high-level overview as well as discussing key concepts in detail. We explore methods for cleaning data gathered from different sources, transforming, selecting, and extracting features, performing exploratory data analysis and dimensionality reduction, identifying naturally occurring data clusters, modelling patterns in data, comparing data between groups, and reporting the results. This textbook is a non-profit project. Its online and PDF versions are freely available at https://datawranglingpy.gagolewski.com/.

translated by Google Translate

A Framework for Benchmarking Clustering Algorithms

Marek Gagolewski

Category: Machine Learning | Machine Learning (Statistics)

2022-09-20

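Clustering algorithms can be evaluated by running them on a variety of benchmark problems and comparing their outputs to the reference, ground-truth groupings provided by experts. Unfortunately, many research papers and graduate theses consider only a small number of datasets. Also, the fact that there can be many equally valid ways to cluster a given problem set is rarely taken into account. To overcome these limitations, we have developed a framework whose aim is to introduce a consistent methodology for testing clustering algorithms. Furthermore, we have aggregated, polished, and standardised many clustering benchmark batteries referred to across the machine learning and data mining literature, and included new datasets of different dimensionalities, sizes, and cluster types. An interactive datasets explorer, the documentation of the Python API, a description of the ways to interact with the framework from other programming languages such as R or MATLAB, and other details are all provided at https://clustering-benchmarks.gagolewski.com.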
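
The idea of comparing an algorithm's output against several equally valid reference groupings can be sketched as follows. This is only an illustration, not the framework's Python API: the function names are hypothetical, the plain Rand index stands in for whichever similarity score is preferred, and taking the best score over all references is an assumed aggregation rule.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two partitions agree
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def best_score(predicted, references):
    """Assumed rule: score a clustering against every reference
    labelling and keep the most favourable result."""
    return max(rand_index(predicted, ref) for ref in references)


# Hypothetical toy data: one dataset with two expert labellings.
reference_two_clusters = [0, 0, 1, 1, 1, 1]
reference_three_clusters = [0, 0, 1, 1, 2, 2]
predicted = [0, 0, 1, 1, 2, 2]

score = best_score(predicted, [reference_two_clusters, reference_three_clusters])
```

Here the prediction matches the three-cluster reference exactly, so it is not penalised for disagreeing with the coarser two-cluster labelling.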

Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm

Marek Gagolewski , Maciej Bartoszuk , Anna Cena

Category: Machine Learning | Machine Learning (Statistics)

2022-09-13

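The time needed to apply a hierarchical clustering algorithm is most often dominated by the number of computations of a pairwise dissimilarity measure. For larger datasets, this constraint puts the use of all the classical linkage criteria except single linkage at a disadvantage. However, the single linkage clustering algorithm is known to be very sensitive to outliers and to produce highly skewed dendrograms, so it usually does not reflect the true underlying data structure unless the clusters are well separated. To overcome its limitations, we propose a new hierarchical clustering linkage criterion called Genie. Namely, our algorithm links two clusters in such a way that a chosen economic inequity measure (e.g., the Gini or Bonferroni index) of the cluster sizes does not increase drastically above a given threshold. The presented benchmarks indicate the high practical usefulness of the introduced method: it most often outperforms the Ward or average linkage in terms of clustering quality while retaining the speed of single linkage. The Genie algorithm is easily parallelizable and thus may be run on multiple threads to speed up its execution further. Its memory overhead is small: there is no need to precompute the complete distance matrix in order to obtain the desired clustering. It can be applied to arbitrary spaces equipped with a dissimilarity measure, e.g., real vectors, DNA or protein sequences, images, rankings, informetric data, etc. A reference implementation of the algorithm is available for R. See also https://genieclust.gagolewski.com for a new implementation (genieclust), available for both R and Python.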
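
The inequity check at the heart of the criterion can be illustrated with a small sketch. It computes a normalised Gini index of the cluster sizes and flags when it exceeds a threshold. The function names, the threshold value of 0.3, and the interpretation in the comments (once the index is too high, the next merge should involve a smallest cluster) are assumptions made for illustration; the abstract does not spell out these details, and this is not the genie or genieclust implementation.

```python
def gini_index(sizes):
    """Normalised Gini index of cluster sizes:
    0 when all clusters are equally large, 1 when one cluster holds every point."""
    k = len(sizes)
    total = sum(sizes)
    if k < 2 or total == 0:
        return 0.0
    pairwise = sum(
        abs(sizes[i] - sizes[j]) for i in range(k) for j in range(i + 1, k)
    )
    return pairwise / ((k - 1) * total)


def sizes_too_unequal(sizes, threshold=0.3):
    """Assumed decision rule: when this returns True, a Genie-like procedure
    would restrict the next merge to pairs involving a smallest cluster
    instead of simply merging the two closest clusters."""
    return gini_index(sizes) > threshold


balanced = [25, 25, 25, 25]
skewed = [1, 1, 1, 97]  # the kind of split single linkage tends to produce
```

On these toy sizes, `sizes_too_unequal(balanced)` is false and `sizes_too_unequal(skewed)` is true, which is the situation where a plain single-linkage merge would be overridden.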

Adjusted Asymmetric Accuracy: A Well-Behaving External Cluster Validity Measure

Marek Gagolewski

Category: Machine Learning | Machine Learning (Statistics)

2022-09-07

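There is no, nor will there ever be, single best clustering algorithm, but we would still like to be able to pinpoint those that perform well on certain task types and filter out the systematically disappointing ones. Clustering algorithms are traditionally evaluated using either internal or external validity measures. Internal measures quantify different aspects of the obtained partitions, e.g., the average degree of cluster compactness or point separability. Yet their validity is questionable, because the clusterings they promote can sometimes be meaningless. External measures, on the other hand, compare the algorithms' outputs to the reference, ground-truth groupings provided by experts. The usual classical partition similarity scores, such as the normalised mutual information, Fowlkes-Mallows, or adjusted Rand index, might not possess all the desirable properties; e.g., they do not identify pathological edge cases correctly. Furthermore, they are not easily interpretable: it is hard to say what a score of 0.8 means. Their behaviour might also vary as the number of true clusters changes. This makes comparing clustering algorithms across many benchmark datasets difficult. To remedy this, we propose and analyse a new measure: an asymmetric version of the optimal set-matching accuracy. It is corrected for chance and for the imbalancedness of cluster sizes.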
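
To make the notion of set-matching accuracy concrete, here is a minimal sketch. The first function is the plain optimal set-matching accuracy, found by brute force over all relabellings (practical only for a handful of clusters). The second function is an assumed illustration of how a correction for cluster-size imbalance and for chance could look: per-cluster recall is averaged, and the result is rescaled so that 1/k maps to 0. That formula and all names are assumptions for this sketch, not a definition taken from the abstract.

```python
from itertools import permutations


def confusion_matrix(y_true, y_pred, k):
    """c[i][j] = number of points in true cluster i assigned to predicted cluster j."""
    c = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        c[t][p] += 1
    return c


def set_matching_accuracy(y_true, y_pred, k):
    """Best fraction of correctly matched points over all one-to-one
    pairings of true and predicted cluster labels."""
    c = confusion_matrix(y_true, y_pred, k)
    best = max(
        sum(c[i][perm[i]] for i in range(k)) for perm in permutations(range(k))
    )
    return best / len(y_true)


def corrected_accuracy(y_true, y_pred, k):
    """Assumed variant: average per-true-cluster recall under the best pairing,
    rescaled so that the value 1/k becomes 0. Requires every true cluster
    to be non-empty."""
    c = confusion_matrix(y_true, y_pred, k)
    row_sums = [sum(row) for row in c]
    best = max(
        sum(c[i][perm[i]] / row_sums[i] for i in range(k)) / k
        for perm in permutations(range(k))
    )
    return (best - 1 / k) / (1 - 1 / k)


y_true = [0, 0, 0, 0, 1, 1]
everything_in_one_cluster = [0, 0, 0, 0, 0, 0]
```

Putting every point into a single cluster still earns a plain accuracy of 4/6 on this imbalanced toy example, whereas the corrected variant gives 0, which is the kind of edge case the abstract refers to.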

Are Cluster Validity Measures (In)valid?

Marek Gagolewski , Maciej Bartoszuk , Anna Cena

Category: Machine Learning (Statistics) | Machine Learning

2022-08-02

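Internal cluster validity measures (such as the Calinski-Harabasz, Dunn, or Davies-Bouldin indices) are frequently used for selecting the appropriate number of partitions a dataset should be split into. In this paper, we consider what happens if we treat such indices as objective functions in unsupervised learning activities. Is the optimal grouping with regard to, say, the Silhouette index really meaningful? It turns out that many cluster (in)validity indices promote clusterings that match expert knowledge quite poorly. We also introduce a new, well-performing variant of the Dunn index that is built upon OWA operators and the near-neighbour graph so that subspaces of higher density, regardless of their shapes, can be separated from each other better.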
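
For readers unfamiliar with internal validity indices, the classical Dunn index can be sketched in a few lines: the smallest distance between points of different clusters divided by the largest within-cluster distance. This is the textbook version with Euclidean distances, not the OWA-based variant the paper introduces; the function name and toy data are illustrative only.

```python
from math import dist


def dunn_index(points, labels):
    """Classical Dunn index: minimum between-cluster point distance divided by
    the maximum cluster diameter. Higher values indicate compact, well-separated
    clusters. Assumes at least two clusters and at least one cluster with two
    distinct points."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    groups = list(clusters.values())

    max_diameter = max(dist(p, q) for g in groups for p in g for q in g)
    min_separation = min(
        dist(p, q)
        for a in range(len(groups))
        for b in range(a + 1, len(groups))
        for p in groups[a]
        for q in groups[b]
    )
    return min_separation / max_diameter


points = [(0, 0), (1, 0), (5, 0), (6, 0)]
natural_split = dunn_index(points, [0, 0, 1, 1])
interleaved_split = dunn_index(points, [0, 1, 0, 1])
```

Treating such an index as an objective function means searching for the labelling that maximises it; here the natural split scores far higher than the interleaved one.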

Crowd Score: A Method for the Evaluation of Jokes using Large Language Model AI Voters as Judges

Fabricio Goes , Zisen Zhou , Piotr Sawicki , Marek Grzes , Daniel G. Brown

Category: Artificial Intelligence

2022-12-21

This paper presents the Crowd Score, a novel method to assess the funniness of jokes using large language models (LLMs) as AI judges. Our method relies on inducing different personalities into the LLM and aggregating the votes of the AI judges into a single score to rate jokes. We validate the votes using an auditing technique that uses the LLM to check whether the explanation for a particular vote is reasonable. We tested our methodology on 52 jokes in a crowd of four AI voters with different humour types: affiliative, self-enhancing, aggressive and self-defeating. Our results show that few-shot prompting leads to better results than zero-shot for the voting question. Personality induction showed that, on a set of aggressive/self-defeating jokes, aggressive and self-defeating voters are significantly more inclined to find jokes funny than the affiliative and self-enhancing voters. The Crowd Score follows the same trend as human judges by assigning higher scores to jokes that are also considered funnier by human judges. We believe that our methodology could be applied to other creative domains such as story, poetry, slogans, etc. It could help the adoption of a flexible and accurate standard approach to compare different work in the CC community under a common metric. By minimizing human participation in assessing creative artefacts, it could also accelerate the prototyping of creative artefacts and reduce the cost of hiring human participants to rate them.
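
The aggregation step can be pictured with a minimal sketch. The abstract does not give the exact formula, so the rule below (the share of voters who found a joke funny) is an assumption, and the votes are made-up placeholders rather than results from the paper.

```python
def crowd_scores(votes):
    """Assumed aggregation: for each joke, the fraction of voters who voted 'funny'.

    votes maps a voter name to a list of booleans, one per joke.
    """
    ballots = list(votes.values())
    n_jokes = len(ballots[0])
    return [
        sum(ballot[j] for ballot in ballots) / len(ballots) for j in range(n_jokes)
    ]


# Hypothetical votes from four voters with different induced humour types.
example_votes = {
    "affiliative": [True, False, True],
    "self-enhancing": [True, False, False],
    "aggressive": [True, True, False],
    "self-defeating": [True, True, False],
}

scores = crowd_scores(example_votes)
```

In a fuller pipeline, each boolean would come from an LLM prompted with the corresponding personality, and votes whose explanations fail the audit would be discarded before aggregation.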

On the Convergence of Policy Gradient in Robust MDPs

Qiuhao Wang , Chin Pang Ho , Marek Petrik

Category: Machine Learning

2022-12-20

Robust Markov decision processes (RMDPs) are promising models that provide reliable policies under ambiguities in model parameters. As opposed to nominal Markov decision processes (MDPs), however, the state-of-the-art solution methods for RMDPs are limited to value-based methods, such as value iteration and policy iteration. This paper proposes Double-Loop Robust Policy Gradient (DRPG), the first generic policy gradient method for RMDPs with a global convergence guarantee in tabular problems. Unlike value-based methods, DRPG does not rely on dynamic programming techniques. In particular, the inner-loop robust policy evaluation problem is solved via projected gradient descent. Finally, our experimental results demonstrate the performance of our algorithm and verify our theoretical guarantees.
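
Projected gradient descent, the tool named for the inner loop, can be sketched on a toy problem: finding the worst-case next-state distribution for a single state-action pair. Everything specific here is an assumption for illustration: the ambiguity set is taken to be the whole probability simplex (real RMDPs use more restrictive sets), the objective is the linear expected value, and the step size and iteration count are arbitrary. This is not the DRPG algorithm itself.

```python
def project_onto_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = 1},
    using the standard sort-based procedure."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:
            theta = candidate  # keep the shift from the last index that qualifies
    return [max(x - theta, 0.0) for x in v]


def worst_case_distribution(values, step=0.5, iterations=50):
    """Minimise the expected value sum(p[i] * values[i]) over the simplex
    by projected gradient descent; the gradient with respect to p is `values`."""
    n = len(values)
    p = [1.0 / n] * n
    for _ in range(iterations):
        p = project_onto_simplex([p_i - step * v_i for p_i, v_i in zip(p, values)])
    return p


next_state_values = [1.0, 0.5, 2.0]  # hypothetical value estimates
p_worst = worst_case_distribution(next_state_values)
```

With an unrestricted simplex, the adversary simply concentrates all probability on the lowest-valued next state; the projection step is what keeps each iterate a valid distribution.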

Flowstorm: Open-Source Platform with Hybrid Dialogue Architecture

Jan Pichl , Petr Marek , Jakub Konrád , Petr Lorenc , Ondřej Kobza , Tomáš Zajíček , Jan Šedivý

Category: Artificial Intelligence

2022-12-19

This paper presents a conversational AI platform called Flowstorm. Flowstorm is an open-source SaaS project suitable for creating, running, and analyzing conversational applications. Thanks to the fast and fully automated build process, the dialogues created within the platform can be executed in seconds. Furthermore, we propose a novel dialogue architecture that uses a combination of tree structures with generative models. The tree structures are also used for training NLU models suitable for specific dialogue scenarios. The generative models, by contrast, are used globally across applications and extend the functionality of the dialogue trees. Moreover, the platform functionality benefits from out-of-the-box components, such as the one responsible for extracting data from utterances or working with crawled data. Additionally, it can be extended using custom code directly in the platform. One of the essential features of the platform is the possibility to reuse the created assets across applications. There is a library of prepared assets to which each developer can contribute. All of the features are available through a user-friendly visual editor.

Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann , Annika Reinke , Vivienn Weru , Minu Dietlinde Tizabi , Fabian Isensee , Tim J. Adler , Patrick Godau , Veronika Cheplygina , Michal Kozubek , Sharib Ali

Category: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.

Analysis and Utilization of Entrainment on Acoustic and Emotion Features in User-agent Dialogue

Daxin Tan , Nikos Kargas , David McHardy , Constantinos Papayiannis , Antonio Bonafonte , Marek Strelec , Jonas Rohnke , Agis Oikonomou Filandras , Trevor Wood

Category: Natural Language Processing

2022-12-07

Entrainment is the phenomenon by which an interlocutor adapts their speaking style to align with their partner in conversations. It has been found in different dimensions, such as acoustic, prosodic, lexical or syntactic. In this work, we explore and utilize the entrainment phenomenon to improve spoken dialogue systems for voice assistants. We first examine the existence of the entrainment phenomenon in human-to-human dialogues with respect to acoustic features and then extend the analysis to emotion features. The analysis results show strong evidence of entrainment in terms of both acoustic and emotion features. Based on these findings, we implement two entrainment policies and assess whether integrating the entrainment principle into a Text-to-Speech (TTS) system improves the synthesis performance and the user experience. We find that integrating the entrainment principle into a TTS system improves performance when considering acoustic features, while no obvious improvement is observed when considering emotion features.
